Fast greedy algorithm for subspace clustering from corrupted and incomplete data
نویسندگان
چکیده
We describe the Fast Greedy Sparse Subspace Clustering (FGSSC) algorithm providing an efficient method for clustering data belonging to a few low-dimensional linear or affine subspaces. The main difference of our algorithm from predecessors is its ability to work with noisy data having a high rate of erasures (missed entries with the known coordinates) and errors (corrupted entries with unknown coordinates). The greedy approach consists in usage of a basic algorithm on data out of the domain of the basic algorithm reliability. When the algorithm fails and its output looses some desirable features of the solution, usually it still brings some information about ”the largest features” of the solution. The greedy algorithm extracts those features and uses them in an iterative way by launching the basic algorithm with the additional information mined out on the previous iterations. Such scheme requires an additional time consuming algorithm loop. We discuss here how to implement the fast version of the greedy algorithm with the maximum efficiency whose greedy strategy is incorporated into iterations of the basic algorithm. We provide numerical evidences that, in the subspace clustering capability, the fast greedy algorithm outperforms not only the existing state-of-the art SSC algorithm taken by the authors as a basic algorithm but also the recent GSSC algorithm. At the same time, its computational cost is only slightly higher than the cost of SSC. The numerical evidence of the algorithm significant advantage is presented for a few synthetic models as well as for the Extended Yale B dataset of facial images. In particular, the face recognition misclassification rate turned out to be 6–20 times lower than for the SSC algorithm. We provide also the numerical evidence that the FGSSC algorithm is able to perform clustering of corrupted data efficiently even when the sum of subspace dimensions significantly exceeds the dimension of the ambient space.
منابع مشابه
A Fuzzy C-means Algorithm for Clustering Fuzzy Data and Its Application in Clustering Incomplete Data
The fuzzy c-means clustering algorithm is a useful tool for clustering; but it is convenient only for crisp complete data. In this article, an enhancement of the algorithm is proposed which is suitable for clustering trapezoidal fuzzy data. A linear ranking function is used to define a distance for trapezoidal fuzzy data. Then, as an application, a method based on the proposed algorithm is pres...
متن کاملFunctional Subspace Clustering with Application to Time Series
Functional data, where samples are random functions, are increasingly common and important in a variety of applications, such as health care and traffic analysis. They are naturally high dimensional and lie along complex manifolds. These properties warrant use of the subspace assumption, but most state-of-the-art subspace learning algorithms are limited to linear or other simple settings. To ad...
متن کاملSpeech Enhancement Through an Optimized Subspace Division Technique
The speech enhancement techniques are often employed to improve the quality and intelligibility of the noisy speech signals. This paper discusses a novel technique for speech enhancement which is based on Singular Value Decomposition. This implementation utilizes a Genetic Algorithm based optimization method for reducing the effects of environmental noises from the singular vectors as well as t...
متن کاملFast and Robust Subspace Clustering Using Random Projections
Over the past several decades, subspace clustering has been receiving increasing interest and continuous progress. However, due to the lack of scalability and/or robustness, existing methods still have difficulty in dealing with the data that possesses simultaneously three characteristics: high-dimensional, massive and grossly corrupted. To tackle the scalability and robustness issues simultane...
متن کاملSpeech Enhancement Through an Optimized Subspace Division Technique
The speech enhancement techniques are often employed to improve the quality and intelligibility of the noisy speech signals. This paper discusses a novel technique for speech enhancement which is based on Singular Value Decomposition. This implementation utilizes a Genetic Algorithm based optimization method for reducing the effects of environmental noises from the singular vectors as well as t...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/1306.1716 شماره
صفحات -
تاریخ انتشار 2013